Dataset statistics
| Number of variables | 7 |
|---|---|
| Number of observations | 1000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 54.8 KiB |
| Average record size in memory | 56.1 B |
Variable types
| Numeric | 6 |
|---|---|
| Categorical | 1 |
Alerts
| Alert | Type |
|---|---|
| X1 has a high overall correlation with X4 and 1 other field | High correlation |
| X2 has a high overall correlation with X4 and 2 other fields | High correlation |
| X4 has a high overall correlation with X1 and 2 other fields | High correlation |
| X5 has a high overall correlation with X1 and 1 other field | High correlation |
| X6 has a high overall correlation with X2 and 2 other fields | High correlation |
| y has a high overall correlation with X2 and 2 other fields | High correlation |
Reproduction
| Analysis started | 2024-07-06 13:47:26.161168 |
|---|---|
| Analysis finished | 2024-07-06 13:47:38.176947 |
| Duration | 12.02 seconds |
| Software version | ydata-profiling v0.0.dev0 |
X1
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 882 |
|---|---|
| Distinct (%) | 88.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.90022 |
| Minimum | -0.436 |
| Maximum | 7.356 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 1 |
| Negative (%) | 0.1% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -0.436 |
|---|---|
| 5-th percentile | 2.39495 |
| Q1 | 2.867 |
| median | 3.666 |
| Q3 | 4.93625 |
| 95-th percentile | 5.8554 |
| Maximum | 7.356 |
| Range | 7.792 |
| Interquartile range (IQR) | 2.06925 |
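Range and interquartile range are plain differences of entries in the quantile tables: Range = Maximum - Minimum and IQR = Q3 - Q1. As a minimal sketch, the snippet below recomputes both for every numeric column from the values reported in this document; the dictionary is transcribed from the quantile tables and the function name is hypothetical.

```python
# (minimum, Q1, Q3, maximum) for each numeric column, transcribed from the
# quantile tables in this report.
QUANTILES = {
    "X1": (-0.436, 2.867, 4.93625, 7.356),
    "X2": (0.585, 3.44175, 5.366, 11.225),
    "X3": (6, 11, 14, 19),
    "X4": (-0.433, 2.90775, 4.3985, 5.965),
    "X5": (-1.306, 3.393, 4.95025, 7.638),
    "X6": (-2.252, 2.2695, 4.35925, 9.796),
}


def range_and_iqr(minimum, q1, q3, maximum):
    """Return (range, IQR): range = max - min, IQR = Q3 - Q1."""
    return maximum - minimum, q3 - q1


for name, (lo, q1, q3, hi) in QUANTILES.items():
    rng, iqr = range_and_iqr(lo, q1, q3, hi)
    print(f"{name}: range={rng:.5f} IQR={iqr:.5f}")
```

Every printed value matches the Range and IQR rows of the corresponding quantile table, up to floating-point rounding.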
Descriptive statistics
| Standard deviation | 1.2249992 |
|---|---|
| Coefficient of variation (CV) | 0.31408463 |
| Kurtosis | -0.7651691 |
| Mean | 3.90022 |
| Median Absolute Deviation (MAD) | 0.9665 |
| Skewness | 0.20911485 |
| Sum | 3900.22 |
| Variance | 1.5006229 |
| Monotonicity | Not monotonic |
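Several rows of the Descriptive statistics tables follow directly from the mean and standard deviation: the coefficient of variation is CV = standard deviation / mean, the variance is the squared standard deviation, and the sum is mean × n with n = 1000 observations. A minimal sketch that recomputes them from the reported values (the function name is hypothetical; small differences in the last digit come from the report rounding the standard deviation):

```python
N = 1000  # number of observations in this dataset

# (mean, standard deviation) transcribed from each Descriptive statistics table.
MOMENTS = {
    "X1": (3.90022, 1.2249992),
    "X2": (4.461672, 1.5381207),
    "X3": (12.548, 2.0498531),
    "X4": (3.641536, 1.0556846),
    "X5": (4.100318, 1.1465952),
    "X6": (3.315121, 1.5522445),
}


def derived_statistics(mean, std, n=N):
    """Return the CV, variance and sum implied by a mean and standard deviation."""
    return {
        "cv": std / mean,       # coefficient of variation
        "variance": std ** 2,   # variance is the squared standard deviation
        "sum": mean * n,        # total of all n observations
    }


for name, (mean, std) in MOMENTS.items():
    stats = derived_statistics(mean, std)
    print(name, {k: round(v, 7) for k, v in stats.items()})
```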
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.897 | 4 | 0.4% |
| 3.201 | 4 | 0.4% |
| 3.167 | 3 | 0.3% |
| 2.888 | 3 | 0.3% |
| 3.355 | 3 | 0.3% |
| 2.918 | 3 | 0.3% |
| 2.691 | 3 | 0.3% |
| 2.816 | 3 | 0.3% |
| 2.784 | 3 | 0.3% |
| 4.673 | 3 | 0.3% |
| Other values (872) | 968 | 96.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.436 | 1 | 0.1% |
| 0.096 | 1 | 0.1% |
| 0.508 | 1 | 0.1% |
| 0.783 | 1 | 0.1% |
| 0.794 | 1 | 0.1% |
| 0.83 | 1 | 0.1% |
| 1.387 | 1 | 0.1% |
| 1.46 | 1 | 0.1% |
| 1.517 | 1 | 0.1% |
| 1.692 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.356 | 1 | 0.1% |
| 7.265 | 1 | 0.1% |
| 7.21 | 1 | 0.1% |
| 7.178 | 1 | 0.1% |
| 6.983 | 1 | 0.1% |
| 6.786 | 1 | 0.1% |
| 6.616 | 1 | 0.1% |
| 6.56 | 1 | 0.1% |
| 6.551 | 1 | 0.1% |
| 6.464 | 1 | 0.1% |
X2
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 905 |
|---|---|
| Distinct (%) | 90.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.461672 |
| Minimum | 0.585 |
| Maximum | 11.225 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 0.585 |
|---|---|
| 5-th percentile | 2.0637 |
| Q1 | 3.44175 |
| median | 4.3375 |
| Q3 | 5.366 |
| 95-th percentile | 7.2396 |
| Maximum | 11.225 |
| Range | 10.64 |
| Interquartile range (IQR) | 1.92425 |
Descriptive statistics
| Standard deviation | 1.5381207 |
|---|---|
| Coefficient of variation (CV) | 0.34474088 |
| Kurtosis | 0.28856222 |
| Mean | 4.461672 |
| Median Absolute Deviation (MAD) | 0.964 |
| Skewness | 0.48742585 |
| Sum | 4461.672 |
| Variance | 2.3658154 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.462 | 3 | 0.3% |
| 4.509 | 3 | 0.3% |
| 4.115 | 3 | 0.3% |
| 3.457 | 3 | 0.3% |
| 4.801 | 3 | 0.3% |
| 3.975 | 2 | 0.2% |
| 6.766 | 2 | 0.2% |
| 4.243 | 2 | 0.2% |
| 3.946 | 2 | 0.2% |
| 5.098 | 2 | 0.2% |
| Other values (895) | 975 | 97.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.585 | 1 | 0.1% |
| 1.27 | 1 | 0.1% |
| 1.369 | 1 | 0.1% |
| 1.442 | 1 | 0.1% |
| 1.466 | 1 | 0.1% |
| 1.506 | 1 | 0.1% |
| 1.524 | 1 | 0.1% |
| 1.59 | 1 | 0.1% |
| 1.64 | 1 | 0.1% |
| 1.666 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11.225 | 1 | 0.1% |
| 10.72 | 1 | 0.1% |
| 9.175 | 1 | 0.1% |
| 9.115 | 1 | 0.1% |
| 8.833 | 1 | 0.1% |
| 8.754 | 1 | 0.1% |
| 8.659 | 1 | 0.1% |
| 8.624 | 1 | 0.1% |
| 8.593 | 1 | 0.1% |
| 8.475 | 1 | 0.1% |
X3
Real number (ℝ)
| Distinct | 14 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.548 |
| Minimum | 6 |
| Maximum | 19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 6 |
|---|---|
| 5-th percentile | 9 |
| Q1 | 11 |
| median | 13 |
| Q3 | 14 |
| 95-th percentile | 16 |
| Maximum | 19 |
| Range | 13 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.0498531 |
|---|---|
| Coefficient of variation (CV) | 0.16336095 |
| Kurtosis | -0.053380111 |
| Mean | 12.548 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.0078694849 |
| Sum | 12548 |
| Variance | 4.2018979 |
| Monotonicity | Not monotonic |
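X3 takes only 14 distinct integer values, and the minimum and maximum tables that follow together list the count of every one of them, so the column can be rebuilt exactly and its summary statistics recomputed. The sketch below does this with Python's standard `statistics` module. The linear-interpolation quantile mirrors the default method of pandas and NumPy; that the report uses this method is an assumption, although it reproduces every reported X3 quantile. The MAD here is the median of |x - median| with no scale factor. Skewness and kurtosis are left out.

```python
import statistics

# Value -> count for X3, transcribed from its frequency tables (all 14 values).
X3_COUNTS = {6: 2, 7: 2, 8: 20, 9: 42, 10: 97, 11: 142, 12: 185,
             13: 171, 14: 182, 15: 85, 16: 48, 17: 17, 18: 4, 19: 3}


def expand(counts):
    """Rebuild the sorted column from a value -> count mapping."""
    return sorted(v for v, c in counts.items() for _ in range(c))


def linear_quantile(sorted_data, q):
    """Quantile by linear interpolation between order statistics, 0 <= q <= 1."""
    pos = (len(sorted_data) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    frac = pos - lo
    return sorted_data[lo] + (sorted_data[hi] - sorted_data[lo]) * frac


data = expand(X3_COUNTS)
median = statistics.median(data)
summary = {
    "n": len(data),
    "mean": statistics.fmean(data),
    "variance": statistics.variance(data),  # sample variance, n - 1 denominator
    "std": statistics.stdev(data),
    "p05": linear_quantile(data, 0.05),
    "q1": linear_quantile(data, 0.25),
    "median": median,
    "q3": linear_quantile(data, 0.75),
    "p95": linear_quantile(data, 0.95),
    "mad": statistics.median(abs(x - median) for x in data),
}
print(summary)
```

The sample (n - 1) variance is what matches the reported 4.2018979; the population variance would be 4.197696.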
Histogram with fixed size bins (bins=14)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 12 | 185 | 18.5% |
| 14 | 182 | 18.2% |
| 13 | 171 | 17.1% |
| 11 | 142 | 14.2% |
| 10 | 97 | 9.7% |
| 15 | 85 | 8.5% |
| 16 | 48 | 4.8% |
| 9 | 42 | 4.2% |
| 8 | 20 | 2.0% |
| 17 | 17 | 1.7% |
| Other values (4) | 11 | 1.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 2 | 0.2% |
| 7 | 2 | 0.2% |
| 8 | 20 | 2.0% |
| 9 | 42 | 4.2% |
| 10 | 97 | 9.7% |
| 11 | 142 | 14.2% |
| 12 | 185 | 18.5% |
| 13 | 171 | 17.1% |
| 14 | 182 | 18.2% |
| 15 | 85 | 8.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 3 | 0.3% |
| 18 | 4 | 0.4% |
| 17 | 17 | 1.7% |
| 16 | 48 | 4.8% |
| 15 | 85 | 8.5% |
| 14 | 182 | 18.2% |
| 13 | 171 | 17.1% |
| 12 | 185 | 18.5% |
| 11 | 142 | 14.2% |
| 10 | 97 | 9.7% |
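The common-values and extreme-values tables are derived from value counts: frequency is count / n, and the common-values table keeps the ten most frequent values and folds the rest into an "Other values (k)" row. A minimal sketch of that construction, checked against the X3 tables above; the function name is hypothetical, and tie ordering is an assumption (this sketch keeps the dictionary's insertion order), so for columns such as X1, where many values tie at a count of 3, the row order may differ from the report.

```python
def frequency_tables(counts, n_rows=10):
    """Build (common, minimum, maximum) tables from a value -> count mapping.

    Each row is (value, count, frequency in percent rounded to 0.1).
    """
    total = sum(counts.values())

    def row(value, count):
        return (value, count, round(100 * count / total, 1))

    # Most frequent first; sorted() is stable, so ties keep insertion order.
    by_freq = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    common = [row(v, c) for v, c in by_freq[:n_rows]]
    rest = by_freq[n_rows:]
    if rest:
        common.append(row(f"Other values ({len(rest)})", sum(c for _, c in rest)))

    minimum = [row(v, counts[v]) for v in sorted(counts)[:n_rows]]
    maximum = [row(v, counts[v]) for v in sorted(counts, reverse=True)[:n_rows]]
    return common, minimum, maximum


# X3 value counts, transcribed from the tables above.
X3_COUNTS = {6: 2, 7: 2, 8: 20, 9: 42, 10: 97, 11: 142, 12: 185,
             13: 171, 14: 182, 15: 85, 16: 48, 17: 17, 18: 4, 19: 3}

common, minimum, maximum = frequency_tables(X3_COUNTS)
for table in (common, minimum, maximum):
    for value, count, pct in table:
        print(f"| {value} | {count} | {pct}% |")
```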
X4
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 872 |
|---|---|
| Distinct (%) | 87.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.641536 |
| Minimum | -0.433 |
| Maximum | 5.965 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 2 |
| Negative (%) | 0.2% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -0.433 |
|---|---|
| 5-th percentile | 1.7449 |
| Q1 | 2.90775 |
| median | 3.8325 |
| Q3 | 4.3985 |
| 95-th percentile | 5.19475 |
| Maximum | 5.965 |
| Range | 6.398 |
| Interquartile range (IQR) | 1.49075 |
Descriptive statistics
| Standard deviation | 1.0556846 |
|---|---|
| Coefficient of variation (CV) | 0.2899009 |
| Kurtosis | -0.19011269 |
| Mean | 3.641536 |
| Median Absolute Deviation (MAD) | 0.648 |
| Skewness | -0.49392676 |
| Sum | 3641.536 |
| Variance | 1.1144699 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.47 | 4 | 0.4% |
| 3.93 | 3 | 0.3% |
| 4.554 | 3 | 0.3% |
| 3.593 | 3 | 0.3% |
| 4.263 | 3 | 0.3% |
| 2.497 | 3 | 0.3% |
| 4.223 | 3 | 0.3% |
| 4.208 | 3 | 0.3% |
| 4.231 | 3 | 0.3% |
| 4.443 | 3 | 0.3% |
| Other values (862) | 969 | 96.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.433 | 1 | 0.1% |
| -0.121 | 1 | 0.1% |
| 0.695 | 1 | 0.1% |
| 0.725 | 1 | 0.1% |
| 0.87 | 1 | 0.1% |
| 0.914 | 1 | 0.1% |
| 0.956 | 1 | 0.1% |
| 0.99 | 1 | 0.1% |
| 1.001 | 1 | 0.1% |
| 1.053 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.965 | 1 | 0.1% |
| 5.915 | 1 | 0.1% |
| 5.899 | 1 | 0.1% |
| 5.862 | 1 | 0.1% |
| 5.847 | 1 | 0.1% |
| 5.712 | 1 | 0.1% |
| 5.696 | 1 | 0.1% |
| 5.643 | 1 | 0.1% |
| 5.615 | 1 | 0.1% |
| 5.609 | 1 | 0.1% |
X5
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 853 |
|---|---|
| Distinct (%) | 85.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.100318 |
| Minimum | -1.306 |
| Maximum | 7.638 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 3 |
| Negative (%) | 0.3% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -1.306 |
|---|---|
| 5-th percentile | 2.49245 |
| Q1 | 3.393 |
| median | 3.7915 |
| Q3 | 4.95025 |
| 95-th percentile | 6.0115 |
| Maximum | 7.638 |
| Range | 8.944 |
| Interquartile range (IQR) | 1.55725 |
Descriptive statistics
| Standard deviation | 1.1465952 |
|---|---|
| Coefficient of variation (CV) | 0.27963568 |
| Kurtosis | 0.8509969 |
| Mean | 4.100318 |
| Median Absolute Deviation (MAD) | 0.611 |
| Skewness | 0.038444717 |
| Sum | 4100.318 |
| Variance | 1.3146806 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.732 | 5 | 0.5% |
| 3.461 | 4 | 0.4% |
| 3.271 | 4 | 0.4% |
| 3.686 | 4 | 0.4% |
| 3.255 | 3 | 0.3% |
| 3.431 | 3 | 0.3% |
| 5.482 | 3 | 0.3% |
| 3.656 | 3 | 0.3% |
| 3.457 | 3 | 0.3% |
| 3.408 | 3 | 0.3% |
| Other values (843) | 965 | 96.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -1.306 | 1 | 0.1% |
| -0.528 | 1 | 0.1% |
| -0.134 | 1 | 0.1% |
| 0.134 | 1 | 0.1% |
| 0.152 | 1 | 0.1% |
| 0.354 | 1 | 0.1% |
| 0.868 | 1 | 0.1% |
| 1.055 | 1 | 0.1% |
| 1.217 | 1 | 0.1% |
| 1.359 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.638 | 1 | 0.1% |
| 7.434 | 1 | 0.1% |
| 7.39 | 1 | 0.1% |
| 7.23 | 1 | 0.1% |
| 7.198 | 1 | 0.1% |
| 6.975 | 1 | 0.1% |
| 6.928 | 1 | 0.1% |
| 6.896 | 1 | 0.1% |
| 6.89 | 1 | 0.1% |
| 6.848 | 1 | 0.1% |
X6
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 924 |
|---|---|
| Distinct (%) | 92.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.315121 |
| Minimum | -2.252 |
| Maximum | 9.796 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 10 |
| Negative (%) | 1.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -2.252 |
|---|---|
| 5-th percentile | 0.99195 |
| Q1 | 2.2695 |
| median | 3.21 |
| Q3 | 4.35925 |
| 95-th percentile | 5.88085 |
| Maximum | 9.796 |
| Range | 12.048 |
| Interquartile range (IQR) | 2.08975 |
Descriptive statistics
| Standard deviation | 1.5522445 |
|---|---|
| Coefficient of variation (CV) | 0.46823163 |
| Kurtosis | 0.40636588 |
| Mean | 3.315121 |
| Median Absolute Deviation (MAD) | 1.0645 |
| Skewness | 0.23692111 |
| Sum | 3315.121 |
| Variance | 2.409463 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.066 | 3 | 0.3% |
| 5.955 | 2 | 0.2% |
| 1.49 | 2 | 0.2% |
| 3.124 | 2 | 0.2% |
| 2.7 | 2 | 0.2% |
| 5.715 | 2 | 0.2% |
| 2.348 | 2 | 0.2% |
| 1.03 | 2 | 0.2% |
| 3.336 | 2 | 0.2% |
| 1.779 | 2 | 0.2% |
| Other values (914) | 979 | 97.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -2.252 | 1 | 0.1% |
| -1.85 | 1 | 0.1% |
| -1.594 | 1 | 0.1% |
| -1.328 | 1 | 0.1% |
| -0.402 | 1 | 0.1% |
| -0.36 | 1 | 0.1% |
| -0.343 | 1 | 0.1% |
| -0.228 | 1 | 0.1% |
| -0.072 | 1 | 0.1% |
| -0.001 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.796 | 1 | 0.1% |
| 8.668 | 1 | 0.1% |
| 8.401 | 1 | 0.1% |
| 8.21 | 1 | 0.1% |
| 8.18 | 1 | 0.1% |
| 7.668 | 1 | 0.1% |
| 7.606 | 1 | 0.1% |
| 7.586 | 1 | 0.1% |
| 7.46 | 1 | 0.1% |
| 7.381 | 1 | 0.1% |
y
Categorical
HIGH CORRELATION 
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
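The report distinguishes Distinct (the number of different values) from Unique. Assuming, as in ydata-profiling, that Unique counts values occurring exactly once, y has 2 distinct values (2 / 1000 = 0.2%) and no unique ones, because both classes occur hundreds of times. A small sketch; the function name is hypothetical.

```python
def distinct_and_unique(counts):
    """Return (distinct, distinct %, unique, unique %) for a value -> count mapping.

    Distinct = number of different values.
    Unique   = number of values that occur exactly once (assumed definition).
    """
    n = sum(counts.values())
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, 100 * distinct / n, unique, 100 * unique / n


# Class counts of y, as reported in its Common Values table.
Y_COUNTS = {"0": 799, "1": 201}
print(distinct_and_unique(Y_COUNTS))  # → (2, 0.2, 0, 0.0)
```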
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 799 | 79.9% |
| 1 | 201 | 20.1% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 799 | 79.9% |
| 1 | 201 | 20.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 799 | 79.9% |
| 1 | 201 | 20.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 799 | 79.9% |
| 1 | 201 | 20.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 799 | 79.9% |
| 1 | 201 | 20.1% |
Correlations
| | X1 | X2 | X3 | X4 | X5 | X6 | y |
|---|---|---|---|---|---|---|---|
| X1 | 1.000 | -0.363 | -0.013 | 0.828 | 0.776 | -0.090 | 0.303 |
| X2 | -0.363 | 1.000 | -0.004 | -0.785 | 0.143 | -0.853 | 0.596 |
| X3 | -0.013 | -0.004 | 1.000 | -0.008 | -0.016 | 0.009 | 0.063 |
| X4 | 0.828 | -0.785 | -0.008 | 1.000 | 0.454 | 0.388 | 0.578 |
| X5 | 0.776 | 0.143 | -0.016 | 0.454 | 1.000 | -0.608 | 0.458 |
| X6 | -0.090 | -0.853 | 0.009 | 0.388 | -0.608 | 1.000 | 0.574 |
| y | 0.303 | 0.596 | 0.063 | 0.578 | 0.458 | 0.574 | 1.000 |
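The high-correlation alerts at the top of the report can be reproduced from this matrix if a column pair is flagged whenever |r| > 0.5. The report does not state its threshold, so 0.5 is an assumption here, as is naming the first flagged partner in column order; both happen to match every alert shown, and X3 (no |r| above 0.5) correctly gets no alert. This is a sketch, not ydata-profiling's implementation, and the function names are hypothetical.

```python
COLUMNS = ["X1", "X2", "X3", "X4", "X5", "X6", "y"]
# Correlation matrix transcribed from the table above (rows and columns in COLUMNS order).
MATRIX = [
    [1.000, -0.363, -0.013, 0.828, 0.776, -0.090, 0.303],
    [-0.363, 1.000, -0.004, -0.785, 0.143, -0.853, 0.596],
    [-0.013, -0.004, 1.000, -0.008, -0.016, 0.009, 0.063],
    [0.828, -0.785, -0.008, 1.000, 0.454, 0.388, 0.578],
    [0.776, 0.143, -0.016, 0.454, 1.000, -0.608, 0.458],
    [-0.090, -0.853, 0.009, 0.388, -0.608, 1.000, 0.574],
    [0.303, 0.596, 0.063, 0.578, 0.458, 0.574, 1.000],
]
THRESHOLD = 0.5  # assumed cut-off for "high correlation"


def high_correlation_alerts(columns, matrix, threshold=THRESHOLD):
    """Map each column to the other columns whose |correlation| exceeds threshold."""
    alerts = {}
    for i, name in enumerate(columns):
        partners = [other for j, other in enumerate(columns)
                    if j != i and abs(matrix[i][j]) > threshold]
        if partners:
            alerts[name] = partners
    return alerts


def alert_text(name, partners):
    """Format an alert: the first partner plus a count of the remaining ones."""
    text = f"{name} has a high overall correlation with {partners[0]}"
    rest = len(partners) - 1
    if rest:
        text += f" and {rest} other field" + ("s" if rest > 1 else "")
    return text


alerts = high_correlation_alerts(COLUMNS, MATRIX)
for name, partners in alerts.items():
    print(alert_text(name, partners))
```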
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you pick out patterns in data completion at a glance.
First rows
| | X1 | X2 | X3 | X4 | X5 | X6 | y |
|---|---|---|---|---|---|---|---|
| 0 | 3.857 | 3.065 | 12 | 4.264 | 3.560 | 4.801 | 0 |
| 1 | 4.977 | 2.168 | 14 | 5.209 | 4.343 | 4.919 | 1 |
| 2 | 2.963 | 5.089 | 10 | 2.907 | 3.401 | 3.344 | 0 |
| 3 | 3.149 | 3.715 | 13 | 3.628 | 3.095 | 4.640 | 0 |
| 4 | 6.047 | 4.877 | 14 | 4.470 | 6.361 | 1.320 | 0 |
| 5 | 5.684 | 5.253 | 14 | 4.125 | 6.138 | 1.192 | 0 |
| 6 | 4.899 | 2.171 | 13 | 5.171 | 4.267 | 4.973 | 1 |
| 7 | 2.686 | 6.721 | 13 | 2.024 | 3.710 | 1.847 | 0 |
| 8 | 4.894 | 4.136 | 16 | 4.264 | 4.962 | 2.931 | 0 |
| 9 | 3.296 | 3.189 | 13 | 3.940 | 3.052 | 5.080 | 0 |
Last rows
| | X1 | X2 | X3 | X4 | X5 | X6 | y |
|---|---|---|---|---|---|---|---|
| 990 | 2.609 | 7.382 | 11 | 1.684 | 3.870 | 1.215 | 0 |
| 991 | 2.366 | 3.281 | 10 | 3.456 | 2.170 | 5.662 | 1 |
| 992 | 2.784 | 5.742 | 6 | 2.522 | 3.458 | 2.795 | 0 |
| 993 | 2.793 | 6.041 | 11 | 2.388 | 3.573 | 2.478 | 0 |
| 994 | 6.094 | 2.545 | 15 | 5.566 | 5.577 | 3.713 | 1 |
| 995 | 1.790 | 3.206 | 10 | 3.217 | 1.576 | 6.159 | 1 |
| 996 | 2.952 | 5.203 | 13 | 2.850 | 3.431 | 3.233 | 0 |
| 997 | 3.342 | 2.858 | 13 | 4.115 | 2.980 | 5.391 | 0 |
| 998 | 5.268 | 2.270 | 12 | 5.301 | 4.666 | 4.601 | 1 |
| 999 | 5.766 | 2.424 | 16 | 5.466 | 5.211 | 4.078 | 1 |